[Feature] Accelerate the comparison for batched by jan-janssen · Pull Request #991 · pyiron/executorlib

jan-janssen · 2026-05-28T22:37:07Z

Summary by CodeRabbit

Refactor
- Updated the batched futures API to accept skip lists as futures instead of pre-materialized lists.
Tests
- Updated test suite to validate the revised skip list parameter handling.

coderabbitai · 2026-05-28T22:37:19Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9718215e-8c2f-4f4c-aef6-d7c7946a2e43

📥 Commits

Reviewing files that changed from the base of the PR and between f71b624 and dbf6852.

📒 Files selected for processing (3)

src/executorlib/standalone/batched.py
src/executorlib/task_scheduler/interactive/dependency.py
tests/unit/standalone/test_batched.py

📝 Walkthrough

Walkthrough

batched_futures now accepts a list of Future[list] objects as the skip parameter instead of pre-materialized lists. The function dereferences these futures internally, builds a skip set using object identities, and adjusts batch sizing and filtering accordingly. The dependency scheduler and unit tests are updated to use the new signature.
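
A minimal sketch of the behavior this walkthrough describes, assuming completed futures are filtered by the `id()` of their results and that the batch size follows `min(n, len(lst) - len(skip_set))`. The names `batched_futures_sketch` and `finished` are hypothetical, and the body is an illustration rather than the code in `src/executorlib/standalone/batched.py`.

```python
from concurrent.futures import Future


def batched_futures_sketch(lst, n, nested_skip_lst):
    """Hypothetical stand-in for batched_futures, for illustration only."""
    # Dereference each skip future and record the identity of every item
    # that an earlier batch already handed out.
    skip_set = {id(item) for f in nested_skip_lst for item in f.result()}
    # Assumed sizing rule: a batch holds n items, or fewer when not enough
    # unskipped futures remain.
    n_expected = min(n, len(lst) - len(skip_set))
    batch = []
    for fut in lst:
        if fut.done() and id(fut.result()) not in skip_set:
            batch.append(fut.result())
            if len(batch) == n_expected:
                return batch
    # Not enough finished, unskipped results yet.
    return []


def finished(value):
    """Hypothetical helper: a Future that already holds `value`."""
    fut = Future()
    fut.set_result(value)
    return fut


# Distinct list objects, so each result has its own identity.
lst = [finished([i]) for i in range(5)]
first = batched_futures_sketch(lst=lst, n=3, nested_skip_lst=[])
# The earlier batch is passed back wrapped in a Future, as the new signature expects.
second = batched_futures_sketch(lst=lst, n=3, nested_skip_lst=[finished(first)])
```

Under these assumptions the second call receives the first batch as a future and only dereferences it inside the function, which is the deferral the walkthrough refers to.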

Changes

Batched futures skip parameter refactoring

| Layer / File(s) | Summary |
|---|---|
| Batched futures API update (`src/executorlib/standalone/batched.py`) | Function signature updated to accept `nested_skip_lst: list[Future[list]]`; implementation dereferences futures via `result()`, builds an internal `skip_set` by collecting `id()` of skipped items, and recalculates batch sizing and filtering logic to use the computed set. |
| Dependency scheduler integration (`src/executorlib/task_scheduler/interactive/dependency.py`) | `_update_waiting_task` updated to pass `nested_skip_lst` directly from `task_wait_dict["kwargs"]["skip_lst"]` to `batched_futures`, deferring future dereferencing and set construction to the API function. |
| Tests updated for nested_skip_lst (`tests/unit/standalone/test_batched.py`) | Both `test_batched_futures` and `test_batched_futures_not_finished` updated to construct `Future` objects for skip items and call `batched_futures` with `nested_skip_lst` parameter; assertions adjusted to match new API behavior. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

pyiron/executorlib#1013: Both PRs modify batched_futures' skip-filtering logic to build an internal set of skipped items' identifiers and use it to exclude already-skipped completed future results.
pyiron/executorlib#756: The main PR updates batched_futures's skip-list contract and adjusts _update_waiting_task's "batched" branch, directly modifying the batching implementation introduced in that PR.

Suggested reviewers

samwaseda

Poem

🐰 A futures dance, now deferred with grace,
Skip-sets bloom where futures embrace,
Nested lists become promises untouched,
Until result() is called—not too much,
Batched and bound in identity's place!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title 'Accelerate the comparison for batched' is vague and does not clearly describe the actual change, which is converting skip_lst parameter from list to set for performance optimization. | Consider a more descriptive title like 'Convert batched_futures skip parameter from list to set' or 'Use set-based lookup for batched_futures skip comparison' to better communicate the specific technical change. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/executorlib/standalone/batched.py (1)
19-25: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Fix batched_futures: value-deduped skip_set can overestimate n_expected and leave later batches permanently unresolved when duplicate result values occur.

In src/executorlib/standalone/batched.py, n_expected = min(n, len(lst) - len(skip_set)), but eligibility is computed via v.result() not in skip_set (value-based). In src/executorlib/task_scheduler/interactive/dependency.py, skip_set is built as a deduplicated set of prior batch results, so duplicates in lst that were already consumed by an earlier batch still remove all remaining occurrences by value, while n_expected only subtracts once per distinct value.

When there aren’t enough remaining non-skipped values to fill the batch, batched_futures keeps returning [], and _update_waiting_task keeps the batch waiting forever.

Concrete example (single value repeated in the consumed batch):

lst results: [1, 1, 1, 2], n=3

Batch 1 returns [1, 1, 1]

Batch 2: skip_set={1}, so n_expected=min(3, 4-1)=3, but only the single future with result 2 is eligible → returns [] indefinitely.

Add tests with duplicate result values (and/or change the algorithm to track already-assigned counts or per-future identity rather than a value-deduped set).
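
The stall in the concrete example can be reproduced with a small stand-in. This is a sketch that assumes the value-based logic quoted in this comment (`n_expected = min(n, len(lst) - len(skip_set))` and a `not in skip_set` check on result values); `batched_by_value` and `finished` are hypothetical names, not functions from executorlib.

```python
from concurrent.futures import Future


def batched_by_value(lst, n, skip_set):
    """Hypothetical value-based variant mirroring the logic described in the review."""
    # Only distinct skipped values are subtracted here.
    n_expected = min(n, len(lst) - len(skip_set))
    # Every occurrence of a skipped value is filtered out here.
    done = [f.result() for f in lst if f.done() and f.result() not in skip_set]
    if len(done) >= n_expected:
        return done[:n_expected]
    return []


def finished(value):
    """Hypothetical helper: a Future that already holds `value`."""
    fut = Future()
    fut.set_result(value)
    return fut


lst = [finished(v) for v in [1, 1, 1, 2]]
batch_1 = batched_by_value(lst, n=3, skip_set=set())
# The consumed batch collapses to the single value 1 once it becomes a set.
batch_2 = batched_by_value(lst, n=3, skip_set=set(batch_1))
```

In this sketch the second call expects three results while only one future remains eligible, so it returns an empty list even though every future has finished.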
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/executorlib/standalone/batched.py` around lines 19 - 25, batched_futures
overestimates n_expected because skip_set is value-deduped; change
batched_futures (and its use of skip_set) to account for duplicate result values
by tracking counts or per-future identity: either accept a skip_counts Counter
(value -> number skipped) or build a skip_futures set of already-assigned
futures, then compute n_expected as min(n, count of futures in lst not excluded
by the skip tracking) and when collecting done results decrement the skip count
or mark that specific future as skipped; update references to skip_set,
n_expected, lst and done_lst accordingly and add tests for duplicate-result
scenarios.
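
One way the count-tracking option from this prompt could look, as a sketch only. `batched_with_counts` and `finished` are hypothetical names, the `skip_counts` parameter is the `Counter` the prompt proposes rather than an existing executorlib argument, and the approach still requires hashable result values.

```python
from collections import Counter
from concurrent.futures import Future


def batched_with_counts(lst, n, skip_counts):
    """Hypothetical variant that tracks how often each value was already handed out."""
    remaining = Counter(skip_counts)  # copy, so the caller's counter is not mutated
    # Subtract the total number of skipped results, not the number of distinct values.
    n_expected = max(0, min(n, len(lst) - sum(skip_counts.values())))
    eligible = []
    for fut in lst:
        if not fut.done():
            continue
        value = fut.result()
        if remaining[value] > 0:
            # This occurrence was consumed by an earlier batch.
            remaining[value] -= 1
            continue
        eligible.append(value)
    if len(eligible) >= n_expected:
        return eligible[:n_expected]
    return []


def finished(value):
    """Hypothetical helper: a Future that already holds `value`."""
    fut = Future()
    fut.set_result(value)
    return fut


lst = [finished(v) for v in [1, 1, 1, 2]]
batch_1 = batched_with_counts(lst, n=3, skip_counts=Counter())
batch_2 = batched_with_counts(lst, n=3, skip_counts=Counter(batch_1))
```

With the duplicate-value input from the review, the second call in this sketch expects one result and returns the remaining value instead of waiting.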

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/executorlib/task_scheduler/interactive/dependency.py`:
- Around line 349-353: The current batching builds a value-based skip_set by
flattening f.result() into a set which fails for unhashable batch outputs;
change to an identity/position-based skip key so membership checks don't require
hashing the full result. In
src/executorlib/task_scheduler/interactive/dependency.py replace the flattened
value set with a set of identity keys (e.g. {id(item) for f in
task_wait_dict["kwargs"]["skip_lst"] for item in f.result()} or a tuple of
(future_index, item_index) keys) and update the corresponding membership test in
src/executorlib/standalone/batched.py (the v.result() not in skip_set check) to
compare the same identity/position key (e.g. id(v.result()) or (v_future_index,
v_item_index)) instead of the raw value so unhashable batch items no longer
raise TypeError.

---

Outside diff comments:
In `@src/executorlib/standalone/batched.py`:
- Around line 19-25: batched_futures overestimates n_expected because skip_set
is value-deduped; change batched_futures (and its use of skip_set) to account
for duplicate result values by tracking counts or per-future identity: either
accept a skip_counts Counter (value -> number skipped) or build a skip_futures
set of already-assigned futures, then compute n_expected as min(n, count of
futures in lst not excluded by the skip tracking) and when collecting done
results decrement the skip count or mark that specific future as skipped; update
references to skip_set, n_expected, lst and done_lst accordingly and add tests
for duplicate-result scenarios.
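
The inline comment on `dependency.py` concerns unhashable batch outputs. The sketch below contrasts a value-based skip set with an identity-based one; `value_skip_set` and `identity_skip_set` are hypothetical helpers written for illustration, and the dictionaries stand in for arbitrary unhashable results.

```python
def value_skip_set(batches):
    """Flatten batches into a set of values; requires every item to be hashable."""
    return {item for batch in batches for item in batch}


def identity_skip_set(batches):
    """Flatten batches into a set of object identities; no hashing of the items."""
    return {id(item) for batch in batches for item in batch}


# Dictionaries are unhashable, so they cannot be members of a set.
batches = [[{"value": 1}, {"value": 2}]]

try:
    value_skip_set(batches)
    value_based_failed = False
except TypeError:
    value_based_failed = True

skip_ids = identity_skip_set(batches)
# The very same object is recognized as skipped.
already_used = id(batches[0][0]) in skip_ids
# An equal but separate object is not.
fresh_copy_used = id({"value": 1}) in skip_ids
```

Identity keys are only meaningful while the referenced objects are alive, since Python may reuse an `id()` value after an object is garbage collected; a design along these lines would need to keep the skipped results referenced for as long as the set is in use.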

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9a5e9403-9861-440d-ac4c-96e84ef390c4

📥 Commits

Reviewing files that changed from the base of the PR and between ecdee72 and 6cf9104.

📒 Files selected for processing (2)

src/executorlib/standalone/batched.py
src/executorlib/task_scheduler/interactive/dependency.py

jan-janssen · 2026-05-28T22:48:06Z

@copilot fix the following two errors in the tests:

======================================================================
ERROR: test_batched_futures (unit.standalone.test_batched.TestBatched.test_batched_futures)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/executorlib/executorlib/tests/unit/standalone/test_batched.py", line 13, in test_batched_futures
    self.assertEqual(batched_futures(lst=lst, n=3, skip_lst=[]), [0, 1, 2])
                     ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: batched_futures() got an unexpected keyword argument 'skip_lst'. Did you mean 'skip_set'?

======================================================================
ERROR: test_batched_futures_not_finished (unit.standalone.test_batched.TestBatched.test_batched_futures_not_finished)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/executorlib/executorlib/tests/unit/standalone/test_batched.py", line 23, in test_batched_futures_not_finished
    self.assertEqual(batched_futures(lst=lst, n=3, skip_lst=[]), [])
                     ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: batched_futures() got an unexpected keyword argument 'skip_lst'. Did you mean 'skip_set'?

----------------------------------------------------------------------

Copilot · 2026-05-28T22:50:18Z

@copilot fix the following two errors in the tests:

...

Fixed in the latest commit. Updated the tests to use skip_set=set() / skip_set={...} (flat set) instead of skip_lst=[] / skip_lst=[[...]] to match the batched_futures function signature.

coderabbitai · 2026-05-28T22:52:07Z

Actionable comments posted: 0

coderabbitai · 2026-05-28T22:53:32Z

Actionable comments posted: 0

codecov · 2026-05-28T23:10:21Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.19%. Comparing base (fd8b9a9) to head (dbf6852).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #991   +/-   ##
=======================================
  Coverage   94.19%   94.19%           
=======================================
  Files          39       39           
  Lines        2103     2103           
=======================================
  Hits         1981     1981           
  Misses        122      122

jan-janssen and others added 2 commits May 28, 2026 16:36

[Feature] Accelerate the comparison for batched

7101168

[pre-commit.ci] auto fixes from pre-commit.com hooks

4d8052c

for more information, see https://pre-commit.ci

Merge branch 'main' into fast_batched

6cf9104

coderabbitai Bot reviewed May 28, 2026

Comment thread src/executorlib/task_scheduler/interactive/dependency.py Outdated

Copilot started work on behalf of jan-janssen May 28, 2026 22:48 View session

Fix test parameter name: skip_lst -> skip_set with correct data structure

85682bd

Copilot finished work on behalf of jan-janssen May 28, 2026 22:50

Format black

f71b624

Change versioning method and update version info

21ba863

Updated versioning system from vcs-versioning to setuptools-scm and modified version details.

jan-janssen marked this pull request as draft May 28, 2026 23:02

jan-janssen and others added 8 commits June 10, 2026 17:35

Merge branch 'main' into fast_batched

326aad6

Update dependency.py

210b084

Update test_batched.py

b1306c4

revert to nested list

f3f4ed6

[pre-commit.ci] auto fixes from pre-commit.com hooks

ee9a627

for more information, see https://pre-commit.ci

fixes

1a26e90

fix type hint

10f4f3c

nested list with future

dbf6852

jan-janssen marked this pull request as ready for review June 10, 2026 17:55

jan-janssen merged commit 65f1ef3 into main Jun 10, 2026
90 of 94 checks passed

jan-janssen deleted the fast_batched branch June 10, 2026 17:55

coderabbitai Bot mentioned this pull request Jun 11, 2026

[Feature] Check which future objects are skipped and keep track of their IDs #1014

Merged

jan-janssen merged 14 commits into main from fast_batched

3 participants
